Help files
This page contains the help files for each script in the pipeline. Each help file describes the script's main function, its inputs, and its outputs.

MGFunc Main Script

MGFunc.v2.py

This main script is the command-line tool for running MGFunc as a single automated pipeline.

usage: MGFunc.v2.py -h -g genecatalog -gl gclist -d datfile -df datfasta -dt dattab -c config -p procs -v

Description: MGFunc is a functional and taxonomical annotation tool for large metagenomic data sets.

optional arguments:
  -h, --help      show this help message and exit
  -g genecatalog  Fasta file of the gene catalog
  -gl gclist      A file with a list of fasta files of the gene catalog. Each file
                  represents a sample and must be in the same folder as the gene catalog
  -d datfile      Specify name of uniprot.dat file. Include path, if it is not in
                  current directory.
  -df datfasta    Specify name of uniprot.fasta file. Include path, if it is not in
                  current directory. If it is not found and names are not specified
                  in configuration, swiss2fasta.py will run automatically.
  -dt dattab      Specify name of uniprot.dat.tab file. Include path, if it is not in
                  current directory. If it is not found and names are not specified
                  in configuration, swiss2tab.py will run automatically.
  -c config       configuration FILE. Make sure that the file extension is .ini
  -p procs        maximum processors
  -v, --verbose   increase output verbosity

Written by Kosai+Asli, MAR 2013. Last modified AUG 2014.

Module 1 - Preparation

Module 1 Documentation

Utils.py

Utils.py has no help file. It is a collection of helper functions gathered in a single Python file.

swiss2tab.py

usage: swiss2tab.py -i -o

Description: Extracts AC, ID, DE, GN, Taxonomy, AC(cession), Organism, ncbi_taxID, GO-term, KEGG-id from a STOCKHOLM-formatted file and converts it to tabular format

optional arguments:
  -h, --help      show this help message and exit
  -i database     STOCKHOLM-formatted database
  -o OUTPUT NAME  output-name, put the whole output name, e.g. '-o uniprot.dat.tab'
  -v              Verbose. Prints progress and details to stdout. Write "-v" with no
                  arguments on the command line. Default is off.

Example: python2.7 swiss2tab.py -i uniprot_sprot.dat -o uniprot_sprot.tab

Written by Kosai+Asli, OCT 2013. Last modified MAY 2014.

swiss2fasta.py

usage: swiss2fasta.py -i -o

Description: Extracts sequences and IDs from a STOCKHOLM-formatted file and converts them to fasta format

optional arguments:
  -h, --help      show this help message and exit
  -i database     STOCKHOLM-formatted database (UNIPROT)
  -o OUTPUT NAME  output-name, fasta
  -v              Verbose. Prints progress and details to stdout. Write "-v" with no
                  arguments on the command line. Default is off.

Example: python2.7 swiss2fasta.py -i uniprot_sprot.dat -o uniprot_sprot.fasta

Written by Kosai+Asli, OCT 2013. Last modified MAR 2014.

Module 2 - Homology Search

Module 2 Documentation

bldecide.py

usage: bldecide.py -h -id bldecidein -it bltabin -ib blin -o output -bf [blast|psiblast] -l lengths -n [savenew|nosave] -s INT -q INT -e FLOAT -v

example usage:
  bldecide.py -it test.blasttab -l test.lengths -v
  bldecide.py -id test.blastdecide -s 30 -q 30 -e 0.0001 -sn nosave

description: This script parses blast/ublast results and filters them based on the given cut-offs. Blast results should be in -m 0 format or tab-separated -m 6 format. With ublast, the results should be obtained with the -blast6out option.

optional arguments:
  -h, --help            show this help message and exit
  -id bldecidein        pre-made blastdecide result FILE as an input back again
  -it bltabin           blast tabular result FILE as an input
  -ib blin              blast/psi-blast -m 0 result FILE as an input
  -o output             Output FILE name (default=inputfile.blastdecide)
  -bf [blast|psiblast]  blast -m 0 output file format (default=blast)
  -l lengths            Query lengths FILE (required if tabular blast result input
                        (-it) is given)
  -n [savenew|nosave]   save new blastdecide or not (default=savenew)
  -s INT                minimum similarity cutoff
  -q INT                minimum query coverage cutoff
  -e FLOAT              evalue cutoff, e.g. 1e-5 (default=1e-10), decimals allowed,
                        e.g. 0.0001
  -v, --verbose         increase output verbosity

Author: Asli I. Ozen (asli@cbs.dtu.dk)

Module 3 - Clustering

Module 3 Documentation

clustergenes.py

usage: clustergenes.py -h -p preblast -o outputname -b INT -m [reciprocal|tri] -a -ob -v

description: This script clusters genes from a given blastdecide file using reciprocal best hits or triple-hits graphs.

optional arguments:
  -h, --help           show this help message and exit
  -p preblast          output from blastdecide as input FILE (required)
  -o outputname        output FILE basename. Results will be saved as
                       'basename.extension'. Path name can be included
  -b INT               number of best hits to use in reciprocal graph (default=3)
  -m [reciprocal|tri]  All reciprocal best-hits or min triplet reciprocal best hits
                       method (default=reciprocal)
  -a, --allhits        all hits will be used in reciprocal graph, not only best hits
  -ob                  if option given, outputs only the best hit for each gene,
                       clusters are not generated
  -v, --verbose        increase output verbosity

Author: Asli I. Ozen (asli@cbs.dtu.dk)

clusterFilter.py

usage: clusterFilter.py -i cluster.txt -n 4 (-k k3mhn/K3MHN)

Takes a cluster file and selects the clusters that contain a user-specified number of koala/samples.

optional arguments:
  -h, --help                     show this help message and exit
  -i [Cluster-name(s) ...]       Cluster-file(s)
  -n number [sample number ...]  Number of different samples in each cluster
  -k [sample-names ...]          Sample IDs to be selected. If your samples are called
                                 S1_x.gene and S2_x.gene, example: '-k HKM'
  -m [gene-number ...]           Minimum number of genes per cluster
  -o [Output ...]                Output name
  -p operator                    Logical operator, AND (&,AND,A) or OR (|,OR,O).
  -v                             Verbose. Prints progress and details to stdout. Write
                                 "-v" with no arguments on the command line. Default
                                 is off.

Written by Kosai+Asli, oct 2013. Last modified mar 2014.

Module 4 - Cluster Annotation

Module 4 Documentation

parseclinfo.py

usage: parseclinfo.py -h -c clusterfile -b [blastdecide ...] -ut uniprottab -ui uniprotindex -g allgeneids -uif uniprotindex -bh -na -e expand -i -t T -o outputbase -v

description: This script identifies the uniprot hits for each cluster. If the option -bh is given, it outputs only the best hit for each gene in a cluster. An index file for the genes_vs_database.blastdecide file is created first in order to speed up the process.

optional arguments:
  -h, --help            show this help message and exit
  -c clusterfile        cluster file
  -b [blastdecide ...]  blastdecide file
  -ut uniprottab        uniprot tab formatted data file
  -ui uniprotindex      uniprot tab index data file
  -g allgeneids         Allgeneids used in the study
  -uif uniprotindex     name of the folder with uniprot tab index data files. Files
                        must end with '.tab.index'
  -bh, --besthit        If chosen only best uniprot hits will be output
  -na, --addna          Include results for unknown genes
  -e expand             Include the actual size of each gene from precluster file
                        (homology reduction)
  -i, --index           If chosen, blastdecide file is indexed first.
  -t T                  Thread limit
  -o outputbase         output FILE basename. Results will be saved as
                        'basename.extension'. Path name can be included
  -v, --verbose         increase output verbosity

Author: Asli I. Ozen (asli@cbs.dtu.dk) and Kosai Iskold (kosai@cbs.dtu.dk)

goawrapper.py

usage: goawrapper.py -h -i I -a A -p P -s S -o O -t T -c -v

description: This script runs the Goatools gene enrichment analysis on the given cluster file using the Association and Population files.

optional arguments:
  -h, --help     show this help message and exit
  -i I           Your cluster-file with GO-terms. (Obligatory).
  -a A           Association file. A file with gene names and their GO terms in a
                 tab-separated format
  -p P           Population file. If you have an association file, population is
                 generally the first column
  -s S           Relative location or abs. path for the find_enrichment.py
  -o O           Name of output file base
  -t T           Thread limit
  -c, --cleanup  cleanup individual enrichment results
  -v, --verbose  increase output verbosity

Author: Asli I. Ozen (asli@cbs.dtu.dk) and Kosai Iskold (kosai@cbs.dtu.dk)

Module 5 - Sequence Extraction

Module 5 Documentation

cluster2fasta.py

usage: cluster2fasta.py -c mycluster.txt -o mycluster.output -num -ui uniprot.index/uniprot.index.p -uf uniprot.fasta -ki SAMPLE.index/SAMPLE.index.p -kf SAMPLE.fasta

optional arguments:
  -h, --help                    show this help message and exit
  -ui [uniprot_index_file ...]  Uniprot index file
  -uf [uniprot_fasta ...]       Fasta-file for all uniprot (from swiss2fasta)
  -ki sample_index_file         Genecatalog index file
  -kf sample_fasta              Fasta-file for all genecatalog sequences
  -sfi sample_list              A list of genecatalog index files and fasta files
  -c Cluster-name               Cluster-file
  -o Output                     Output name
  -num                          Adds 2 columns to a new file, with cluster_id's,
                                number of sample-genes and number of uniprot ID's
  -v                            Verbose. Prints progress and details to stdout. Write
                                "-v" with no arguments on the command line. Default
                                is off.

Written by Kosai+Asli, oct 2013. Last modified apr 2014.

splitclusters.py

usage: splitclusters.py -h -i infasta -cl clusters -o ofold -v

description: This is a small utility script that splits sequences into separate clusters. The Clusters.fasta file must already be processed and its headers formatted accordingly.

optional arguments:
  -h, --help     show this help message and exit
  -i infasta     Input FASTA FILE (required)
  -cl clusters   input list of cluster ids to be separated from the fasta file of
                 all clusters (required for -s option)
  -o ofold       Output folder name
  -v, --verbose  increase output verbosity, prints details, warnings etc.

Author: Asli I. Ozen (asli@cbs.dtu.dk)

changeheaders.py

usage: changeheaders.py -h -i infasta -n newfasta -ht headertable -v

description: This is a small utility script that changes the headers of fasta files to unique, short names for the subsequent alignment. changeheaders.py can be used separately with any fasta file.

optional arguments:
  -h, --help       show this help message and exit
  -i infasta       Input FASTA FILE (required)
  -n newfasta      output name for new FASTA FILE (default = inputfile.new.fsa)
  -ht headertable  output header conversion table FILE name
                   (default = inputfile.headertable)
  -v, --verbose    increase output verbosity, prints details, warnings etc.

Author: Asli I. Ozen (asli@cbs.dtu.dk)

Module 6 - Phylogenetic Trees

Module 6 Documentation

treenames.py

usage: treenames2.py -i header-dict -t mytree.dnd -format phyloxml -v

optional arguments:
  -h, --help      show this help message and exit
  -i [I ...]      Header-change file(s)
  -t [T ...]      tree file(s)
  -u U            tab-formatted Uniprot file
  -n N            Node names, can be 'D' (Description/function), 'G' (Gene name,
                  DEFAULT), 'T' (taxonomy), or 'A' (All: tax, descript, genename).
  -o O            Output name. If not specified, working-dir will be used
  -format FORMAT  format, can be 'phyloxml', 'newick', etc.
  -p              prints trees out in STDOUT
  -c              Deletes all temporary files
  -v              prints date and time out

Written by Kosai+Asli, nov 2013. Last modified mar 2014.
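
As a closing illustration of the cut-off filtering that the bldecide.py help text (Module 2) describes, the following is a minimal sketch and not the script's actual implementation. It assumes the standard 12-column tab-separated BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score), uses percent identity as a stand-in for the "similarity" cut-off, and computes query coverage as the aligned query span divided by the query length. The function name `filter_hits` and the sample records are hypothetical; the default thresholds mirror the values shown in the help text (30, 30 and 1e-10).

```python
def filter_hits(blast_lines, query_lengths, min_identity=30.0,
                min_query_cov=30.0, max_evalue=1e-10):
    """Keep tabular BLAST hits that pass all three cut-offs.

    blast_lines   -- iterable of tab-separated lines (assumed 12-column layout)
    query_lengths -- dict mapping query id to full query length
    """
    kept = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        query, subject = cols[0], cols[1]
        identity = float(cols[2])            # percent identity (assumed similarity measure)
        qstart, qend = int(cols[6]), int(cols[7])
        evalue = float(cols[10])
        # Query coverage: aligned part of the query as a percentage of its full length.
        coverage = 100.0 * (abs(qend - qstart) + 1) / query_lengths[query]
        if (identity >= min_identity and coverage >= min_query_cov
                and evalue <= max_evalue):
            kept.append((query, subject, identity, round(coverage, 1), evalue))
    return kept


if __name__ == "__main__":
    # Hypothetical query lengths and hits, for illustration only.
    lengths = {"geneA": 200, "geneB": 300}
    hits = [
        "geneA\tP12345\t85.0\t180\t27\t0\t11\t190\t5\t184\t1e-50\t250",
        "geneB\tQ67890\t25.0\t100\t75\t0\t1\t100\t1\t100\t1e-3\t40",
    ]
    for hit in filter_hits(hits, lengths, max_evalue=1e-4):
        print(hit)
```

With the sample data, only the geneA hit passes: the geneB hit falls below the identity cut-off and above the e-value cut-off. Keeping the three thresholds as keyword arguments mirrors the -s, -q and -e options, so the same function can be rerun with looser or stricter settings without re-parsing logic changes.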